Fisher’s Exact Test (2×2) — Intuition + NumPy Implementation#
Fisher’s exact test answers a simple question:
Given a 2×2 contingency table, is there evidence that the two categorical variables are associated (not independent)?
It is especially useful when sample sizes are small (or expected counts are low), where large-sample approximations (like the chi-square test) can be unreliable.
What you’ll learn#
when Fisher’s exact test is the right tool
what “exact” means (conditioning on margins → hypergeometric distribution)
how the p-value is constructed for one-sided vs two-sided tests
a low-level NumPy-only implementation you can read end-to-end
how to interpret the result (and what it does not tell you)
Prerequisites#
basic probability (combinations)
null/alternative hypotheses + p-values
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.random.seed(42)
import sys
import plotly
print("python:", sys.version.split()[0])
print("numpy:", np.__version__)
print("plotly:", plotly.__version__)
python: 3.12.9
numpy: 1.26.2
plotly: 6.5.2
1) When to use Fisher’s exact test#
Use Fisher’s exact test when:
you have two categorical variables, each with two levels (a 2×2 table)
you want to test whether they are independent
counts are small (or expected counts are low), and you want an exact p-value
Common examples:
A/B tests: variant A vs B, conversion yes/no
clinical studies: treatment vs control, improved yes/no
survey analysis: group membership vs response category
Fisher’s exact test is valid for any sample size, but it’s most often chosen when the chi-square approximation is questionable.
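A common rule of thumb says the chi-square approximation becomes questionable when any expected count falls below 5. A minimal sketch of that check, using the example table from the next section (the threshold of 5 is a widely used convention, not a hard rule):

```python
import numpy as np

table = np.array([[8, 2], [1, 5]])  # the example table used below

# Expected counts under independence: (row total * column total) / grand total
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n

# Heuristic: prefer an exact test if any expected count is below 5
if expected.min() < 5:
    print(f"min expected count = {expected.min():.3f} -> prefer an exact test")
else:
    print("expected counts look large enough for the chi-square approximation")
```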
2) The 2×2 table + hypotheses#
We’ll write the 2×2 table like this:
| | Outcome = 1 | Outcome = 0 |
|---|---|---|
| Group = 1 | a | b |
| Group = 0 | c | d |
Null hypothesis (H₀): the variables are independent (equivalently, the odds ratio = 1)
Alternative (H₁) depends on the question:
greater: Group=1 has higher odds of Outcome=1 (odds ratio > 1)
less: Group=1 has lower odds of Outcome=1 (odds ratio < 1)
two-sided: any association (odds ratio ≠ 1)
# Example: treatment (1) vs control (0), success (1) vs failure (0)
treatment = np.array([1] * 10 + [0] * 6)
success = np.array([1] * 8 + [0] * 2 + [1] * 1 + [0] * 5)
a = int(np.sum((treatment == 1) & (success == 1)))
b = int(np.sum((treatment == 1) & (success == 0)))
c = int(np.sum((treatment == 0) & (success == 1)))
d = int(np.sum((treatment == 0) & (success == 0)))
table = np.array([[a, b], [c, d]], dtype=int)
table
array([[8, 2],
[1, 5]])
row_labels = ["Treatment", "Control"]
col_labels = ["Success", "Failure"]
fig = px.imshow(
table,
text_auto=True,
aspect="auto",
x=col_labels,
y=row_labels,
color_continuous_scale="Blues",
title="Observed 2×2 contingency table",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
fig = px.imshow(
expected,
text_auto=".2f",
aspect="auto",
x=col_labels,
y=row_labels,
color_continuous_scale="Greens",
title="Expected counts under independence (H₀)",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()
3) Effect size: the odds ratio#
The odds ratio (OR) is a common effect size for 2×2 tables:
\[ \mathrm{OR} = \frac{a \cdot d}{b \cdot c} \]
Interpretation:
OR = 1: no association (what H₀ asserts)
OR > 1: Group=1 is more likely to have Outcome=1 (positive association)
OR < 1: Group=1 is less likely to have Outcome=1 (negative association)
Fisher’s exact test gives you a p-value for the association. You usually report both the p-value and an effect size (like OR).
a, b, c, d = table.ravel()
num = a * d
den = b * c
odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)
odds_ratio
20.0
4) What makes it “exact”: conditioning on the margins#
The key idea behind Fisher’s exact test is conditioning on the margins (row sums and column sums).
For a 2×2 table, if the margins are fixed:
row sums: \(r_1 = a+b\), \(r_2 = c+d\)
column sums: \(c_1 = a+c\), \(c_2 = b+d\)
total: \(n = r_1 + r_2\)
…then the whole table is determined by just one number: the top-left cell \(a\).
Under \(H_0\) (independence) and given the margins, the distribution of \(a\) is hypergeometric:
\[ P(A = a) = \frac{\binom{c_1}{a}\binom{c_2}{r_1 - a}}{\binom{n}{r_1}} \]
So we can compute the probability of every possible 2×2 table with these same margins — exactly.
def log_factorials_upto(n: int) -> np.ndarray:
"""Return log(k!) for k=0..n as a NumPy array."""
n = int(n)
log_fact = np.zeros(n + 1, dtype=float)
if n >= 1:
log_fact[1:] = np.cumsum(np.log(np.arange(1, n + 1)))
return log_fact
def hypergeom_pmf_for_a_values(
a_values: np.ndarray,
*,
r1: int,
c1: int,
n: int,
log_fact=None,
) -> np.ndarray:
"""Hypergeometric PMF for A (top-left cell) given fixed margins."""
a_values = np.asarray(a_values, dtype=int)
r1 = int(r1)
c1 = int(c1)
n = int(n)
c2 = n - c1
if log_fact is None:
log_fact = log_factorials_upto(n)
def log_choose(n_: int, k_: np.ndarray) -> np.ndarray:
return log_fact[n_] - log_fact[k_] - log_fact[n_ - k_]
log_p = (
log_choose(c1, a_values)
+ log_choose(c2, r1 - a_values)
- log_choose(n, np.array(r1, dtype=int))
)
# Stabilize with log-sum-exp via a shift
log_p = log_p - np.max(log_p)
p = np.exp(log_p)
return p / np.sum(p)
a_obs = int(table[0, 0])
r1 = int(table[0, :].sum())
c1 = int(table[:, 0].sum())
n = int(table.sum())
c2 = n - c1
a_min = max(0, r1 - c2)
a_max = min(r1, c1)
a_values = np.arange(a_min, a_max + 1)
pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n)
np.column_stack([a_values, pmf])
array([[3.00000000e+00, 1.04895105e-02],
[4.00000000e+00, 1.10139860e-01],
[5.00000000e+00, 3.30419580e-01],
[6.00000000e+00, 3.67132867e-01],
[7.00000000e+00, 1.57342657e-01],
[8.00000000e+00, 2.36013986e-02],
[9.00000000e+00, 8.74125874e-04]])
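If SciPy is available (it's treated as optional later in this notebook), the same PMF can be cross-checked against `scipy.stats.hypergeom`. SciPy parameterizes the distribution as (M, n, N) = (population size, number of "successes", number of draws), which maps to our (n, c1, r1):

```python
import numpy as np
from scipy.stats import hypergeom

# Margins for the example table [[8, 2], [1, 5]]
n, r1, c1 = 16, 10, 9
a_values = np.arange(max(0, r1 - (n - c1)), min(r1, c1) + 1)

# scipy.stats.hypergeom.pmf(k, M, n, N): k successes in N draws
# from a population of M containing n successes
pmf_scipy = hypergeom.pmf(a_values, n, c1, r1)
print(np.round(pmf_scipy, 6))
```

The values should match the array printed above.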
def odds_ratio_for_a_values(a_values: np.ndarray, *, r1: int, c1: int, n: int) -> np.ndarray:
"""Compute odds ratios for all feasible tables with varying a and fixed margins."""
a_values = np.asarray(a_values, dtype=int)
r1 = int(r1)
c1 = int(c1)
n = int(n)
r2 = n - r1
b = r1 - a_values
c = c1 - a_values
d = r2 - c
num = a_values.astype(float) * d
den = b.astype(float) * c
or_ = np.full_like(num, np.nan, dtype=float)
mask = den != 0
or_[mask] = num[mask] / den[mask]
or_[~mask & (num > 0)] = np.inf
return or_
or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)
colors = np.where(a_values == a_obs, "#111111", "#636EFA")
fig = go.Figure(
go.Bar(
x=a_values,
y=pmf,
marker_color=colors,
customdata=np.column_stack([or_values]),
hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
)
)
fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
fig.update_layout(
title="All feasible tables (fixed margins) → hypergeometric PMF for a",
xaxis_title="a = count in the top-left cell",
yaxis_title="Probability under H₀ (conditional on margins)",
)
fig.show()
5) From the PMF to a p-value#
Once we have the probability of every feasible table (given the margins), we can define “extreme” outcomes.
One-sided p-values#
greater: sum probabilities of tables with a ≥ a_obs (more evidence of positive association)
less: sum probabilities of tables with a ≤ a_obs (more evidence of negative association)
Two-sided p-value (common definition)#
For a two-sided test there is a subtlety: the hypergeometric distribution is discrete, so "two-sided" needs a precise definition.
A widely used definition (the one SciPy uses) is:
Sum the probabilities of all tables whose probability is ≤ the observed table's probability.
This includes both tails, even when the distribution is asymmetric.
p_obs = pmf[a_obs - a_min]
p_greater = float(pmf[a_values >= a_obs].sum())
p_less = float(pmf[a_values <= a_obs].sum())
p_two_sided = float(pmf[pmf <= p_obs + 1e-12].sum())
p_greater, p_less, p_two_sided
(0.024475524475524438, 0.9991258741258742, 0.03496503496503492)
6) Fisher’s exact test from scratch (NumPy-only)#
Below is a complete implementation of Fisher’s exact test for a 2×2 table.
It enumerates all feasible tables (via the feasible values of \(a\)).
It computes the hypergeometric probabilities in a numerically stable way.
It supports the greater, less, and two-sided alternatives.
def fisher_exact_numpy(table: np.ndarray, alternative: str = "two-sided", return_details: bool = False):
"""Fisher's exact test for a 2x2 contingency table (NumPy-only).
Parameters
----------
table : array-like, shape (2, 2)
Non-negative counts.
alternative : {'two-sided', 'greater', 'less'}
Defines the alternative hypothesis.
return_details : bool
If True, also return the enumerated support and PMF.
Returns
-------
odds_ratio : float
p_value : float
details : dict (optional)
"""
table = np.asarray(table, dtype=int)
if table.shape != (2, 2):
raise ValueError("table must be shape (2, 2)")
if np.any(table < 0):
raise ValueError("counts must be non-negative")
a, b, c, d = table.ravel()
r1 = int(a + b)
r2 = int(c + d)
c1 = int(a + c)
c2 = int(b + d)
n = int(r1 + r2)
# Sample odds ratio (effect size)
num = a * d
den = b * c
odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)
# Enumerate feasible values of a given fixed margins
a_min = max(0, r1 - c2)
a_max = min(r1, c1)
a_values = np.arange(a_min, a_max + 1)
log_fact = log_factorials_upto(n)
pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n, log_fact=log_fact)
p_obs = pmf[int(a - a_min)]
alt = alternative.lower().replace("_", "-").strip()
if alt in {"greater", "right", "right-sided", "right sided"}:
p_value = float(pmf[a_values >= a].sum())
elif alt in {"less", "left", "left-sided", "left sided"}:
p_value = float(pmf[a_values <= a].sum())
elif alt in {"two-sided", "two sided"}:
p_value = float(pmf[pmf <= p_obs + 1e-12].sum())
else:
raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
p_value = float(min(p_value, 1.0))
if not return_details:
return odds_ratio, p_value
details = {
"a_values": a_values,
"pmf": pmf,
"a_obs": int(a),
"p_obs": float(p_obs),
"margins": {"r1": r1, "r2": r2, "c1": c1, "c2": c2, "n": n},
}
return odds_ratio, p_value, details
for alt in ["greater", "less", "two-sided"]:
or_, p_ = fisher_exact_numpy(table, alternative=alt)
print(f"{alt:>9} | odds ratio = {or_:>6.3g} | p-value = {p_:.6f}")
greater | odds ratio = 20 | p-value = 0.024476
less | odds ratio = 20 | p-value = 0.999126
two-sided | odds ratio = 20 | p-value = 0.034965
# Optional: verify against SciPy (if installed)
try:
from scipy.stats import fisher_exact
for alt in ["greater", "less", "two-sided"]:
or_scipy, p_scipy = fisher_exact(table, alternative=alt)
or_np, p_np = fisher_exact_numpy(table, alternative=alt)
print(f"{alt:>9} | scipy p={p_scipy:.6f} | numpy p={p_np:.6f} | scipy OR={or_scipy:.3g}")
except Exception as e:
print("SciPy check skipped:", e)
greater | scipy p=0.024476 | numpy p=0.024476 | scipy OR=20
less | scipy p=0.999126 | numpy p=0.999126 | scipy OR=20
two-sided | scipy p=0.034965 | numpy p=0.034965 | scipy OR=20
7) Visualizing “extreme” tables (greater / less / two-sided)#
The plots below show which feasible tables are counted in the p-value.
Gray bars: feasible tables that are not counted
Red bars: feasible tables that are counted for that alternative
The vertical dashed line marks the observed value \(a_{obs}\)
def plot_pmf_with_rejection_region(details: dict, *, alternative: str) -> go.Figure:
a_values = details["a_values"]
pmf = details["pmf"]
a_obs = details["a_obs"]
p_obs = details["p_obs"]
r1 = details["margins"]["r1"]
c1 = details["margins"]["c1"]
n = details["margins"]["n"]
or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)
alt = alternative.lower().replace("_", "-").strip()
if alt in {"greater", "right", "right-sided", "right sided"}:
mask = a_values >= a_obs
p_value = float(pmf[mask].sum())
title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
elif alt in {"less", "left", "left-sided", "left sided"}:
mask = a_values <= a_obs
p_value = float(pmf[mask].sum())
title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
elif alt in {"two-sided", "two sided"}:
mask = pmf <= p_obs + 1e-12
p_value = float(pmf[mask].sum())
title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
else:
raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
colors = np.where(mask, "#EF553B", "#B0B0B0")
fig = go.Figure(
go.Bar(
x=a_values,
y=pmf,
marker_color=colors,
customdata=np.column_stack([or_values]),
hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
)
)
fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
fig.update_layout(
title=title,
xaxis_title="a (top-left cell)",
yaxis_title="Probability under H₀ (conditional on margins)",
)
return fig
_, _, details = fisher_exact_numpy(table, return_details=True)
for alt in ["greater", "less", "two-sided"]:
plot_pmf_with_rejection_region(details, alternative=alt).show()
8) How to interpret Fisher’s exact test#
What the p-value means here#
With Fisher’s exact test, the p-value is:
The probability (under H₀: independence) of observing a table at least as extreme as the one you saw, given that the margins are fixed.
So:
a small p-value suggests the observed association would be rare under independence → evidence against H₀
a large p-value means the data are not surprising under independence → not enough evidence to reject H₀
What it does not mean#
It does not give the probability that H₀ is true.
It does not tell you the size of the association (use OR / risk ratio / risk difference for that).
A good report typically includes:
the 2×2 table
the odds ratio (effect size)
the p-value (and the chosen alternative)
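A sketch of such a report (this uses `scipy.stats.fisher_exact` for brevity; the wording is just one reasonable template, not a standard format):

```python
import numpy as np
from scipy.stats import fisher_exact

table = np.array([[8, 2], [1, 5]])
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

print("2x2 table:")
print(table)
print(f"odds ratio = {odds_ratio:.3g}")
print(f"two-sided exact p-value = {p_value:.4f}")
```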
9) A helpful sanity check: p-values under the null are discrete#
Because Fisher’s exact test works with a discrete distribution (the hypergeometric), the set of possible p-values is discrete.
If you repeatedly sample tables from the null (with the same fixed margins) and compute the two-sided p-value, you'll see spikes rather than a perfectly uniform distribution. The test remains valid (it is typically conservative).
r1 = details["margins"]["r1"]
c1 = details["margins"]["c1"]
n = details["margins"]["n"]
c2 = n - c1
a_values = details["a_values"]
pmf = details["pmf"]
two_sided_p_for_each_a = np.array([float(pmf[pmf <= p_i + 1e-12].sum()) for p_i in pmf])
n_sims = 20_000
a_sim = np.random.hypergeometric(ngood=c1, nbad=c2, nsample=r1, size=n_sims)
p_sim = two_sided_p_for_each_a[a_sim - a_values.min()]
alpha = 0.05
print("Pr(reject at alpha=0.05) under H0 (empirical):", float(np.mean(p_sim <= alpha)))
fig = px.histogram(
p_sim,
nbins=30,
title="Two-sided Fisher exact p-values under H₀ (fixed margins)",
labels={"value": "p-value"},
)
fig.add_vline(x=alpha, line_color="#EF553B", line_dash="dash")
fig.show()
Pr(reject at alpha=0.05) under H0 (empirical): 0.03565
10) Pitfalls + practical notes#
Conditional on margins: Fisher’s test conditions on fixed row/column totals. In some study designs (e.g., case–control), margins are naturally fixed; in others they aren’t, but the test is still commonly used.
Two-sided definition: multiple “two-sided Fisher” definitions exist. Always specify which one you use (this notebook uses the common “probability ≤ observed probability” rule).
Zeros → infinite OR: if a cell is 0, the sample odds ratio can be 0 or ∞. That’s not “wrong”, but interpret carefully and consider reporting confidence intervals with appropriate methods.
p-value vs effect size: a tiny p-value can correspond to a small effect with large n; a large p-value can occur with a large OR but tiny n. Always look at the table and an effect size.
Multiple testing: if you run many Fisher tests, adjust for multiple comparisons.
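For the zero-cell case, one common workaround for reporting an effect size is the Haldane–Anscombe correction: add 0.5 to every cell before computing the odds ratio. A minimal sketch (the table here is made up for illustration; the correction changes the point estimate, not the exact p-value):

```python
import numpy as np

table = np.array([[5, 0], [2, 3]])  # hypothetical table with a zero cell
a, b, c, d = table.ravel()

# Raw sample OR is infinite because b = 0
raw_or = np.inf if (b * c == 0 and a * d > 0) else (a * d) / (b * c)

# Haldane–Anscombe correction: add 0.5 to every cell
a2, b2, c2, d2 = table.ravel() + 0.5
corrected_or = (a2 * d2) / (b2 * c2)

print("raw OR:", raw_or)            # inf
print("corrected OR:", corrected_or)
```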
11) Exercises#
Pick a different 2×2 table and compute Fisher's exact p-values for greater, less, and two-sided.
Change the margins (row/column totals) while keeping the odds ratio similar — how does the p-value change?
For fixed margins, compute the PMF and plot it; identify which tables contribute to the two-sided p-value.
References#
Fisher, R. A. (1922). On the interpretation of χ² from contingency tables.
Hypergeometric distribution: see any standard probability text.
SciPy: scipy.stats.fisher_exact (for a reference implementation).